perm filename A38.TEX[154,RWF] blob sn#816456 filedate 1986-05-02 generic text, type C, neo UTF8
COMMENT ⊗   VALID 00002 PAGES
C REC  PAGE   DESCRIPTION
C00001 00001
C00002 00002	\magnification\magstephalf
C00021 ENDMK
C⊗;
\magnification\magstephalf
\input macro.tex
\def\today{\ifcase\month\or
  January\or February\or March\or April\or May\or June\or
  July\or August\or September\or October\or November\or December\fi
  \space\number\day, \number\year}
\baselineskip 14pt
\rm
\line{\sevenrm a38.tex[154,phy] \today\hfill}

\bigskip
\line{CS 154\hfill Spring 1983}
\line{Prof. R.W. Floyd\hfill}

\bigskip\noindent
If $L$ is a language over $\Sigma↑{\ast}$, it can be used to define two equivalence
relations.  The first relation makes two strings equivalent if they can be
used interchangably anywhere; the second if they can be used interchangably
at the beginnings of strings.  We call the first $\zzap$, and the
second $\zap$.

We define $\zzap$ (infix equivalence) by saying $x\zzap y$ if replacing an $x$ by a
$y$, or a $y$ by an $x$, in any string of $L$, results in a string of $L$; in
a formula
$$(x \zzap y)≡(∀u,v\in\Sigma↑{\ast})(uxv\in L≡uyv\in L).$$
Since replacing $x$ by itself does not change it, $x \zzap x$, and
$\zzap$ is reflexive.  By the symmetry of $≡$ in the definition,
$x \zzap y≡y \zzap x$, and $\zzap$ is symmetric.  If
$x \zzap y$ and $y \zzap z$, then $uxv\in L⊃uyv\in L⊃uzv\in L$, etc., so
$\zzap$ is transitive.

We define $\zap$ (prefix equivalence) by saying $x \zap y$ if replacing an $x$ by a
$y$ or a $y$ by an $x$ 
at the beginning of a string of $L$ results in a string of $L$; 
in a formula, $$(x \zap y)≡(∀v)(xv\in L≡yv\in L).$$
As with $\zzap$, $\zap$ is reflexive, symmetric, and transitive; both 
are equivalence relations; $\zzap$ is often called {\sl congruence}.  They have
some closure properties under concatenation:

\medskip
\disleft 20pt:(1):If $x \zzap y$ and $u \zzap v$, then $xu \zzap yv$.

\smallskip
\disleft 20pt:(2):If $x \zap y$, and $u \zzap v$, then $xu \zap yv$.

\smallskip
\disleft 20pt:(3):If $x \zap y$, and $u\in\Sigma↑{\ast}$, then $xu \zap yu$. 
(A special case of (2).)

\smallskip
\disleft 20pt:(4):If for all $u$, $ux \zap uy$, then $x \zzap y$.

\medskip\noindent
By substituting $\epsilon$ for $u$ in the definition of $\zzap$, we see that
$x \zzap y$ implies $x \zap y$; that is, $\zzap$ is
{\sl stronger\/} than $\zap$, or $\zap$ is {\sl weaker\/} than $\zzap$.

As with all equivalence relations, $\approx$ and $\sim$ (we omit the $L$ if no
ambiguity arises) imply a partition of $\Sigma↑{\ast}$ into equivalence classes.
If $C$ is an equivalence class, and $x$ and $y$ are two members of $C$, then
$x≡y$, so $x≡u$ implies $y≡u$, etc., and the members of $C$ can, for many
purposes, be used interchangably.  
Let $\hbox{Cong}(x)=\{\,y\mid x\approx y\,\}$, and
$\hbox{Eq}(x)=\{\,y\mid x\sim y\,\}$.  Let {\bf E} be the set of equivalence
 classes 
and~{\bf C} the set of congruence classes.  We call the number (finite or
infinite) of equivalence classes the {\sl index\/} of an equivalence relation.

\vfill\eject

\proclaim
Theorem. A language $L$ is an FML if and only if $\zzap$ has finite index.
\par

\medskip\noindent
{\bf Proof.}

\smallskip
(1)\xskip FML implies finite index. Let $M$ be a 
DFM$(Q,\Sigma ,\delta,q↓0,F)$ accepting   
$L$. For any string~$x$ in~$\Sigma↑{\ast}$, 
let $\delta ↓x$ be the function $Q→Q$ defined
by $\delta ↓x(q)=\delta(q,x)$; if $n=|Q|$, there are at most $n↑n$ such functions.
If $\delta ↓x$ and $\delta ↓y$ are the same function, then $x\approx y$; in fact,
$$uxv\in L≡\delta ↓v\bigl(\delta ↓x\bigl(\delta ↓u(q↓0)\bigr)\bigr)
\in F≡\delta ↓v\bigl(\delta ↓y\bigl(\delta ↓u(q↓0)\bigr)\bigr)
\in F≡uyv\in L\,.$$  
If we could find more than $n↑n$ non-equivalent
strings, their transition functions would all be different, which is not possible
(pigeonhole principle).

(2)\xskip Finite index implies FML.  Let $L$ be a language, 
$\zzap$ of finite index.
Define $M=\bigl(\hbox{\bf E},\Sigma,\delta,$
$\hbox{Eq}(\epsilon),\ F\bigr)$, 
where $\delta$  is defined by $\delta\bigl(\hbox{Cong}(x),\ a\bigr)
=\hbox{Cong}(xa)$, and 
$F=\{\,\hbox{Cong}(x)\mid x\in L\,\}$.

\smallskip
\noindent
For the student: confirm that $\delta$ is well-defined, and that $M$ 
accepts~$L$.

\proclaim
Theorem. A language is an FML if and only if $\zap$ has finite index.
\par

\disleft 20pt:(1):FML implies finite index.  Let $M$ be as usual.  
Let $q(x)=\delta(q↓0,x)$.
If $n=|Q|$, there are at most $n$ different values of $q(x)$.  If $q(x)=q(y)$,
then $x\sim y$; in fact
$$xv\in L≡\delta\bigl(q(x),v\bigr)\in F≡\delta\bigl(q(y),v\bigr)\in F≡yv\in L\,.$$
If we could find more than $n$ non-congruent strings, their values of $q(x)$
would all be different, which is not possible.

\disleft 20pt:(2):Finite index implies FML.  Left as an exercise.

\bigskip
\disleft 38pt::
{\bf Exercise.\/}
Define $L↓1/L↓2=\{\,x\mid xy\in L↓1, y\in L↓2\,\}$. Show if $L↓1$ is
regular, so is $L↓1/L↓2$, for {\it any\/} language~$L↓2$.

\proclaim
Theorem. The minimum number of states required by a DFM for $L$ is the
index of $\zap$; left as an exercise.

\bigskip\noindent
{\bf Myhill-Nerode Theorem.}
For $L$ a language, define $f(x)=\{\,y\mid xy\in L\,\}$. If $L$ is
an FML, $\delta(q↓0,x↓1)=\delta(q↓0,x↓2)$ implies $f(x↓1)=f(x↓2)$, and
since $\delta$ has finite range, $f$~has only finitely many values.

Define $x↓1\sim x↓2$ (``prefix equivalence'') if $f(x↓1)=f(x↓2)$;
clearly an equivalence relation. If $x↓1\sim x↓2$, then $x↓1z\sim x↓2z$;
this property is called right invariance. The language~$L$ is the union of
those equivalence classes $E(x)$ for which $\epsilon\in f(x)$. Then if
$L$ is an FML, it is the union of some equivalence classes of a finite
right invariant equivalence relation on strings. (The converse is true
also.) If $f$ has infinite range, no FM can recognize~$L$, since
$x↓1\not\sim x↓2$ requires $\delta(q↓0,x↓1)≠\delta(q↓0,x↓2)$.

\medskip\noindent
{\bf Example.}  Let $L↓1=\{A↑iB↑i,i≥1\}$.  Then $x \zzap y$ if $x=y$, and also if:

\smallskip
\disleft 20pt:(1):both contain $BA$

\smallskip\noindent
or

\smallskip
\disleft 20pt:(2):$x=A↑iB↑j,y=A↑kB↑l,i≥1,j≥1,k≥1,l≥1,$ and $i-j=k-l$.

\smallskip\noindent
Also, $x \zap y$ if $x=y$, and also if

\smallskip
\disleft 20pt:(1):each contains $BA$ or has more $B$'s than~$A$'s.

\smallskip\noindent
or

\smallskip
\disleft 20pt:(2):$x=A↑iB↑j,y=A↑kB↑l,i≥j≥1,k≥l≥1$, and $i-j=k-l$.

\smallskip
Prefix equivalence classes for the infinite language $\{\,L↑iR↑i\mid i≥0\,\}$

\figbox 2truein:

\noindent
States can be labeled with the shortest string in the equivalence class:
$\epsilon$, $0$, $00$, $000$, $\ldots$, $01$, $001$, $0001$, $\ldots$, $010$.

\medskip\noindent
{\bf Example.}  Let $L↓2$ be the set of palindromes over $\{A,B\}$. 
Then $x \zzap y$, and $x \zapp y$, only if $x=y$.

\smallskip\noindent
Languages $L↓1$ and $L↓2$ have infinite index, so they are not FML's.

\medskip\noindent
{\bf Example.}  Let $L↓3$ be the set of strings over $\{A,B\}$ containing $ABA$.
Then $x \zappp y$ if:

\smallskip
\disleft 20pt:(1):Both contain $ABA$ or

\smallskip
\disleft 20pt:(2):Neither contains $ABA$, and each ends in $A$, or

\smallskip
\disleft 20pt:(3):Neither contains $ABA$, and each ends in $AB$ or

\smallskip
\disleft 20pt:(4):Neither contains $ABA$, and each is $\epsilon$, 
or $B$, or ends in $BB$.

\smallskip
\noindent
Thus there are four equivalence classes, corresponding to this minimal $FM$:

\figbox 1.5truein:

\disleft 38pt::
{\bf Hard Exercise:\/} define $\zappp$.

Because $x\approx y$, $u\approx v$ implies $xu\approx yv$, we can define
concatenation on equivalence classes; $\hbox{Cong}(x)\otimes
\hbox{Cong}(y)=\hbox{Cong}(xy)$.

\medskip
\disleft 38pt::
{\bf Exercise\/}: show this uniquely defines $\otimes$. (We will omit 
$\otimes$ when we 
feel like it.)  The infix
equivalence classes of a FML form a finite semigroup under
concatenation (i.e., $\otimes$ is associative), with identity element 
$\hbox{Eq}(\epsilon)$.
Conversely, if $\Sigma$ is a set of elements of a finite semigroup, with 
operation~$\odot$, 
and $S$ is any subset of the semigroup, there is an FM that reads
$a↓1a↓2\cdots a↓n\epsilon\Sigma↑{\ast}$, accepting iff 
$a↓1\odot a↓2\cdots\odot a↓nεS$.
The FM
need only keep track of the semigroup product of the characters it has read.

Summing up, we have a large set of equivalent definitions of FML-ness:

\smallskip
\disleft 20pt:(1):$L$ is accepted by a DFM

\smallskip
\disleft 20pt:(2):$L$ is accepted by a NFM

\smallskip
\disleft 20pt:(3):$L$ is regular (defined by a regular expression)

\smallskip
\disleft 20pt:(4):$L$ has a prefix equivalence relation of finite index

\smallskip
\disleft 20pt:(5):$L$ has an infix equivalence relation of finite index

\smallskip
\disleft 20pt:(6):There is a homomorphism $h$ from $\Sigma↑{\ast}$ to a 
finite semigroup $S$ where
$x\in L$ if and only if $h(x)$ belongs to a particular subset of $S$.

\proclaim
Theorem.    $L$ is an FML if and only if there is a number $n$ such that
every string of length~$n$ is infix (prefix)
equivalent to some shorter string.
\par

\medskip
Thus, for $L$ an FML, we can give a finite list of replacement rules $(x,y)$,
$|x|>|y|$, such that to test whether $z\in L$, we make allowed replacements 
of~$x$ by~$y$ until no
more are possible, and then test for membership in a finite set.

\medskip\noindent
{\bf Example.}   $L↓3$ as defined above.  A sufficient set of replacement rules is:

\noindent
$(BABA,ABA)$

\noindent
$(ABAB,ABA)$

\noindent
$(BBB,BB)$

\noindent
$(AA,A)$

\noindent
$(ABBA,A)$


\proclaim
Corollary. 
If for each $n≥0$ there is a string of length $n$ not infix (prefix)
equivalent for $L$ to
any shorter string, then $L$ is not a FML.
\par

\proclaim
Corollary.
If $L$ is not an FML, then there is an infinite sequence of characters
$a↓1a↓2a↓3\cdots$, for which the finite prefixes $a↓1a↓2\cdots a↓n$ are all prefix
inequivalent for L.  Any program to recognize $L$ must go into a new state
at each character as it reads this sequence.  Then for each $n$, there is some
input of length $n$ that makes it use at least $\lceil \lg (n)\rceil$ 
bits of storage.
\par

\proclaim
Theorem. Two way NFMs accept only regular sets.
\par

\medskip
Let $M$ be a 2-way nondeterministic FM. Define, for every string $x$, a set
of state pairs $P↓{LR}(x)=\{\,(i,j)\mid$ if $M$ enters $x$
on the left in state~$i$, it may leave on the right in state~$j\,\}$.
Similarly, $P↓{LL}(x)=\{\langle i,j\rangle\mid$ if $M$ enters $x$ on
the left in state~$i$, it may leave on the left in state~$j\,\}$;
$P↓{RL}(x)$ and $P↓{RR}(x)$ are analogous for entering on the right. These
relations may be obtained recursively:
$$\eqalign{P↓{LR}(xy)&=P↓{LR}(x)\bigl(P↓{LL}(y)P↓{RR}(x)\bigr)↑{\ast}
R↓{LR}(y)\cr
P↓{LL}(xy)&=P↓{LL}(x)∪P↓{LR}(x)\bigl(P↓{LL}(y)P↓{RR}(x)\bigr)↑{\ast}
R↓{LL}(y)P↓{RL}(x)\,.\cr}$$
A one-way automaton with finite memory for $L↓M$ keeps track, when it
has read~$x$, of $P↓{LL}(x)$, $P↓{LR}(x)$, $P↓{RL}(x)$, and $P↓{RR}(x)$.
When it reads another character~$y$, it uses the above properties to get
$P↓{LL}(xy)$, etc. At the end of the input, $x$~is accepted if 
%$P↓{LR}(x)$
%includes $\langle q↓0,p\rangle$ for some $p$ in~$F$.
$S\circ P↓{LR}(x)\circ F$.


\bigskip
\parindent0pt
\copyright 1983 Robert W. Floyd
First draft (not published)
Spring 1983.
%revised date;
%subsequently revised.

\bye